Factors associated with low school readiness, a linked health and education data study in Wales, UK

Background School readiness is a measure of a child’s cognitive, social, and emotional readiness to begin formal schooling. Children with low school readiness need additional support from schools for learning, developing required social and academic skills, and catching up with their school-ready peers. This study aims to identify the most significant risk factors associated with low school readiness using linked routine data for children in Wales.
Method This was a longitudinal cohort study using linked data. The cohort comprised children who completed the Foundation Phase assessment between 2012 and 2018. Individuals were identified by linking the Welsh Demographic Service and Pre16 Education Attainment datasets. School readiness was assessed via the binary outcome of the Foundation Phase assessment (achieved/not achieved). This study used a multivariable logistic regression model and a decision tree to identify and weight the most important risk factors associated with low school readiness.
Results In order of importance, logistic regression identified maternal learning difficulties (adjusted odds ratio 5.35 (95% confidence interval 3.97–7.22)), childhood epilepsy (2.95 (2.39–3.66)), very low birth weight (2.24 (1.86–2.70)), being a boy (2.11 (2.04–2.19)), being on free school meals (1.85 (1.78–1.93)), living in the most deprived areas (1.67 (1.57–1.77)), maternal death (1.47 (1.09–1.98)), and maternal diabetes (1.46 (1.23–1.78)) as factors associated with low school readiness. Using a decision tree, eligibility for free school meals, being a boy, absence/low attendance at school, being born late in the academic year, being a low birthweight child, and not being breastfed were factors associated with low school readiness.
Conclusion This work suggests that public health interventions focusing on children who are boys, live in deprived areas, have poor early years attendance, have parents with learning difficulties, or have parents with an illness or have illnesses themselves would make the most difference to school readiness in the population.


Background
Early childhood education shapes the direction of a child's development, enhances their ability to learn in the school environment and strengthens their foundation for lifelong learning [1,2]. School readiness encompasses cognitive, social, and emotional aspects and indicates if a child can achieve at an appropriate level in formal school. School readiness is also a determinant of health and wellbeing over the life course [3,4]. It is strongly linked to the pre-school environment, and it indicates the acquisition of the necessary social skills, emotional skills, knowledge, and attitude to effectively engage and learn in school. School readiness is defined by a child's physical well-being and motor development (e.g., co-ordination, fine motor skills), social and emotional development (co-operation, empathy, and the ability to express their emotion), approaches towards learning (enthusiasm, curiosity, temperament), language and communication (listening and speaking), basic knowledge (essential vocabulary and numbers) and cognitive skills (problem solving) [4].
A review of published literature on the risk factors associated with school readiness indicates that area-level characteristics, parental demography, and parental and child health conditions play a significant role in school readiness. Factors associated with higher school readiness include higher levels of child care provision in the area where the child is brought up [5,6], living in private housing [7], the mother's age (late twenties or thirties) [7,8], breastfeeding (higher rates and longer duration) [7,9], dual parent households, a nurturing parenting style [7,10] and parents with good physical [10,11] and mental health [7,9,10]. Similarly, good physical health of the child (being born at term and with a healthy birth weight) [12,13] is also associated with higher school readiness. Conversely, low access to childcare, higher levels of unemployment (area and family level), living in social housing, exposure to a poor environment such as damp, maternal heavy drinking behaviours [14], mothers who smoked during pregnancy [5,12], teenage mothers or older mothers (35+ years), parents with poor physical health (hypertension, diabetes), poor mental health, single parent or step-parent families, low expectations by the parent for the child, a preterm or low birth weight child, and poor health of the child are associated with low school readiness [6,7]. An Australian data linkage study conducted by Chittleborough et al. identified a group of predictors (such as maternal age, smoking during pregnancy, parity, marital status, and both parents' occupation and gender) which were able to identify children at risk of developmental vulnerability at school entry [15].
Since being school ready is associated with many positive outcomes, improving school readiness is a necessary strategy for economic development and social mobility [16]. If children are not school ready, it can take many years for them to catch up with their peers, if they ever do [17,18], and this contributes to widening inequalities. School readiness has been identified as a key public health concern in a recent review of UK public health systems and policy approaches to early child development [19]. It is very challenging to identify the right individuals (children and families) who are at risk in order to provide the necessary support [20].
Therefore, identifying the most significant risk factors is a priority in closing the gap in children's school readiness and improving outcomes for children. Studies have shown that routine data obtained at a child's birth can help to identify the children and families at risk of poor development [21,22]. A framework using routinely collected administrative data can inform the appropriate supporting agencies so that they can provide adequate help and support to the most vulnerable in society.

Objective
The aim of the study is to identify and weight the most significant risk factors of low school readiness using linked routine data for children in Wales. This work also examined the risk factors which were clustered together and built a vulnerability profile of the children who are at risk of low school readiness. The factors associated with school readiness are examined using: a) a traditional statistical method (a multivariable logistic regression model, to observe the highly associated risk factors) and b) a data-driven supervised machine learning classification algorithm (a decision tree, to measure the commonly observed and prevalent risk factors at the population level). In the logistic regression model, the log-odds of low school readiness are modelled as a linear combination of explanatory variables and confounders. The decision tree model, on the other hand, is based on recursive partitioning; it highlights the statistically significant, hierarchically clustered features for low school readiness and captures complex relationships between the risk factors and low school readiness. Comparing the hierarchically clustered features from the decision tree with the risk factors identified by the logistic model makes it possible to cross-validate the set of overlapping risk factors and serves to strengthen the importance of the findings. These risk factors will inform the development of a profiling model by identifying the socio-economic, physical health and mental health barriers that the child and/or their family face, which may impact the child's ability to meet the developmental milestones necessary to progress effectively through the early years. The primary focus of the model is to build a holistic understanding of the most significant risk factors of low school readiness, which can inform the support system required for the individuals and families in highest need and make more efficient use of resources.
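As a sketch of the model form described above, the logistic regression can be written as follows. The symbols are illustrative and not taken from the study: $p_i$ is the probability that child $i$ has low school readiness, $x_{i1},\dots,x_{ik}$ are the explanatory variables and confounders, and the adjusted odds ratio for factor $j$ is the exponentiated coefficient.

```latex
\[
\log\frac{p_i}{1-p_i} \;=\; \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij},
\qquad
\mathrm{aOR}_j \;=\; e^{\beta_j}
\]
```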

Sample selection and data linkage
In this cohort study, the study population was derived by linking the Welsh Demographic Service (WDS) dataset (an administrative dataset about individuals in Wales who use NHS services) and the Pre16 Education Attainment dataset (individual-level administrative data relating to the education system in Wales). The study population consists of children who completed Foundation Phase (a statutory curriculum for children aged 3-7 years) [23] between 2012 and 2018. The data linkage was done using an encrypted key known as the Anonymised Linking Field (ALF) in the Secure Anonymised Information Linkage (SAIL) Databank [24,25]. Residential Anonymised Linking Fields (RALFs) are encrypted residential addresses available in the WDS dataset, which are also linked to a smaller geographical unit known as the lower super output area (LSOA). Using ALFs, RALFs and LSOAs, the study population was anonymously linked with the individuals living with the child in the same household during the child's Foundation Phase [26]. Children without valid and continuous RALFs and primary care records in the Welsh Longitudinal General Practice (WLGP) dataset in SAIL until their completion of Foundation Phase were not included in the study, to ensure complete coverage of exposure and outcome data during the study period. The study population was linked with the National Community Child Health Database (NCCHD) to obtain birth and maternal records at childbirth. Records with missing maternal identifiers and mothers with no primary care record in the WLGP dataset were not included in the study. The flow diagram of the selection of the study population is presented in Fig 1.
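The linkage and inclusion steps above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the SAIL Databank workflow (the study reports that data preparation was done on a DB2 SQL platform): the record layout, the field names (`ralf`, `lsoa`, `fp_year`, `achieved`), the sample values and the function `link_cohort` are all hypothetical.

```python
# Hypothetical, already-anonymised extracts keyed by ALF (all values invented).
wds = {
    "ALF1": {"ralf": "R100", "lsoa": "W01000001"},
    "ALF2": {"ralf": None, "lsoa": None},          # no valid residence record
    "ALF3": {"ralf": "R300", "lsoa": "W01000002"},
    "ALF4": {"ralf": "R400", "lsoa": "W01000003"},
}
education = {
    "ALF1": {"fp_year": 2014, "achieved": True},
    "ALF2": {"fp_year": 2015, "achieved": False},
    "ALF3": {"fp_year": 2010, "achieved": True},   # outside 2012-2018
    "ALF4": {"fp_year": 2016, "achieved": False},
    "ALF5": {"fp_year": 2017, "achieved": True},   # no WDS record
}
# ALFs assumed to have continuous primary care records (ALF4 does not).
gp_registered = {"ALF1", "ALF2", "ALF3"}


def link_cohort(wds, education, gp_registered, first_year=2012, last_year=2018):
    """Join education and demographic records on ALF and apply inclusion rules."""
    cohort = {}
    for alf, edu in education.items():
        demo = wds.get(alf)
        if demo is None or demo["ralf"] is None:
            continue  # no valid residential link
        if alf not in gp_registered:
            continue  # incomplete primary care coverage
        if not first_year <= edu["fp_year"] <= last_year:
            continue  # Foundation Phase completed outside the study window
        cohort[alf] = {**demo, **edu}
    return cohort


cohort = link_cohort(wds, education, gp_registered)
print(sorted(cohort))
```

Only the first hypothetical child passes every inclusion rule; each of the others is dropped for a different reason, mirroring the exclusions listed in the text.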

Risk factors from routine data
The selection of risk factors associated with low school readiness was informed by the literature review undertaken at the inception of the study. The risk factors were selected from the routinely collected electronic administrative and health datasets, and this provided the framework upon which the current study was developed. The literature review focused on observational studies, including case-control studies, cohort studies and studies using linked routine data, with primary or secondary outcomes examining school readiness. Depending on the strength of association (odds ratio) between the risk factors and low school readiness, a list of risk factors was prepared, and their analogous variables were created or selected from linked routine data. The literature review used to select the risk factors, and how these were mapped to routine data, are described in a supplementary document (Appendix 1 in S1 File and Appendix 2 in S1 File).
General demography and birth-related variables including gender, gestational age, birth weight, breastfeeding, mode of delivery (caesarean section/assisted delivery/natural delivery) and maternal age at childbirth were obtained from NCCHD. The multiple birth (singleton/non-singleton) flag was derived using the week of birth of the child, the encrypted maternal identifier and the birth order of the children. A binary variable was derived to identify the children who lost their mother before the Foundation Phase. Maternal physical health (diabetes, cancer, anaemia, hypertension, learning difficulty) and mental health (depression, anxiety, serious mental illness, medication related to anxiety/depression) related primary and secondary care records during and after pregnancy until Foundation Phase were obtained from WLGP and the hospital admission dataset, the Patient Episode Database for Wales (PEDW). The Substance Misuse Database (SMD) was used to populate information on the mothers' alcohol or other substance abuse related records during the study period. Any READ and ICD10 codes related to substance misuse recorded in the WLGP and PEDW datasets were also considered in this study. Mothers' alcohol-related hospital admission records were obtained from PEDW. Maternal smoking during and after pregnancy was obtained from WLGP; smoking-related READ codes recorded in the dataset during the study period were used to build the variable. The record of physical assault related hospital admissions of mothers during or after pregnancy was obtained from PEDW. The hospital admission and GP records of the children for epilepsy, asthma, diabetes, ear infections, and eye infections were considered as a measure of child health conditions. Any emergency hospital admission and any accident and emergency (A&E) attendance of the study population between birth and Foundation Phase were obtained from PEDW and the Emergency Department Data Set (EDDS). READ code version 2 and ICD10 codes were used to identify the health records from the WLGP and PEDW datasets (see Appendix 3 in S1 File). Any coded diagnosis of the above-mentioned physical and mental health conditions for mother and child during a GP visit (obtained from WLGP) or hospital admission (obtained from PEDW) was considered in this study. The children's age at the completion of their Foundation Phase and the total number of days they were absent from school in the early years (e.g., nursery) were obtained from the Pre16 Education Attainment dataset. Household characteristics such as living in a single adult household, total number of adults, and total number of other children in the household were derived from the WDS dataset. In this study, eligibility for free school meals (FSM) during Foundation Phase was used to measure the family-level deprivation of the study population. Area-level deprivation was measured by the Welsh Index of Multiple Deprivation (WIMD) 2014, which provides a measure of relative deprivation in Wales linked to LSOA [27]. The local authorities and the type of local area (urban/rural) where the children were brought up during Foundation Phase were included in the study.
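One way the multiple birth flag described above could be derived is sketched below. This is an illustrative simplification, not the study's code: it treats children who share a maternal identifier and week of birth as a non-singleton birth, and it carries birth order along without using it (the study also drew on birth order, but the exact rule is not given here). The record fields and the function `flag_multiple_births` are hypothetical.

```python
from collections import Counter

# Hypothetical birth records; identifiers stand in for encrypted keys.
births = [
    {"child": "C1", "mother": "M1", "birth_week": "2008-W14", "birth_order": 1},
    {"child": "C2", "mother": "M1", "birth_week": "2008-W14", "birth_order": 2},
    {"child": "C3", "mother": "M1", "birth_week": "2010-W40", "birth_order": 3},
    {"child": "C4", "mother": "M2", "birth_week": "2008-W14", "birth_order": 1},
]


def flag_multiple_births(records):
    """Return {child: True if non-singleton} using mother + week of birth."""
    per_delivery = Counter((r["mother"], r["birth_week"]) for r in records)
    return {
        r["child"]: per_delivery[(r["mother"], r["birth_week"])] > 1
        for r in records
    }


flags = flag_multiple_births(births)
print(flags)
```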

School readiness from routine data
The binary Foundation Phase Indicator variable was obtained from the Pre16 Education Attainment dataset and was used as a measure of school readiness from the routine data in the current study. The National Curriculum assesses school readiness using the Foundation Phase Indicator at the end of the early years foundation stage, when the child is aged 6 or 7. The Foundation Phase Indicator represents whether the child has achieved the expected level 5 or above in the early stage learning goals in the following areas: i) personal and social development, well-being and cultural diversity, ii) language, literacy, and communication skills (English/Welsh) and iii) mathematical development [28]. In this study a binary variable has been derived from the Foundation Phase Indicator record as a measure of school readiness from routine data.
• Low school readiness = Not achieved Foundation Phase

Statistical analysis
A multivariable logistic regression model was first developed to identify and weight the most important risk factors associated with school readiness. Next, we built a data-driven machine learning classifier using a decision tree to investigate the most commonly observed risk factors at the population level. Since children with learning difficulties or special educational needs tend to have a much higher risk of low school readiness, they were removed from the models. Data preparation, including data linkage, was performed on a DB2 SQL platform and the statistical analysis was done in R version 4.0.3.
Logistic regression. To identify the most important risk factors associated with low school readiness we used multivariable logistic regression. Variables included gender, gestational age, birthweight, breastfeeding, caesarean section, multiple birth, maternal age, maternal death before Foundation Phase, maternal physical and mental health, child physical (epilepsy, asthma, diabetes, ear, and eye) and mental health conditions (depression, anxiety), free school meal uptake, local area status and number of adults and children living in the same household. The significant risk factors of low school readiness are presented with their adjusted odds ratio (aOR) and 95% confidence interval (CI).
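To make the link between a fitted coefficient and a reported aOR with its 95% CI concrete, the sketch below assumes a Wald-type interval, exp(beta ± 1.96 × SE). The source does not state how the intervals were computed, so this is an assumption, and the function names are hypothetical. The coefficient and standard error are back-calculated from a reported estimate, aOR 2.11 (2.04–2.19) for boys, purely for illustration.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper, z=Z_95):
    """Back out the standard error of log(OR) from a reported Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def odds_ratio_ci(beta, se, z=Z_95):
    """Convert a log-odds coefficient and its SE to (aOR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Back-calculated (not reported) values for the 'being a boy' estimate.
beta_boy = math.log(2.11)
se_boy = se_from_ci(2.04, 2.19)

aor, lower, upper = odds_ratio_ci(beta_boy, se_boy)
print(f"aOR {aor:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Because the interval is symmetric on the log scale rather than on the odds scale, the reported upper bound sits slightly further from the point estimate than the lower bound does.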
Decision tree. A classification tree (decision tree) algorithm was developed using the RPART (Recursive Partitioning And Regression Trees) package in R [29,30]. The algorithm repeatedly partitions the data into multiple subspaces until it reaches homogeneous end subspaces, hence it is called recursive partitioning. For the decision tree, the data for one representative local authority were removed from the dataset and used as the testing dataset to validate model performance and examine generalisability within areas of Wales.
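The recursive partitioning idea described above can be illustrated with a small, self-contained sketch. This is not the RPART implementation: it handles only binary features, uses Gini impurity as the splitting criterion (an assumption made for this example), applies no pruning or complexity parameter, and runs on invented toy data. The names `gini`, `best_split`, `grow` and the toy rows are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, features):
    """Return (feature, weighted child impurity) for the best split, or None."""
    best = None
    for f in features:
        yes = [r["low_readiness"] for r in rows if r[f]]
        no = [r["low_readiness"] for r in rows if not r[f]]
        if not yes or not no:
            continue  # feature does not separate these rows
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(rows)
        if best is None or score < best[1]:
            best = (f, score)
    return best


def grow(rows, features):
    """Recursively partition rows until no split reduces impurity."""
    labels = [r["low_readiness"] for r in rows]
    node = {"n": len(rows), "p_low": sum(labels) / len(labels)}
    split = best_split(rows, features)
    if split is None or gini(labels) - split[1] < 1e-12:
        return node  # terminal node
    f = split[0]
    remaining = [g for g in features if g != f]
    node["split"] = f
    node["yes"] = grow([r for r in rows if r[f]], remaining)
    node["no"] = grow([r for r in rows if not r[f]], remaining)
    return node


# Invented toy data: 1 = yes, 0 = no.
toy_rows = [
    {"fsm": 1, "boy": 1, "low_readiness": 1},
    {"fsm": 1, "boy": 1, "low_readiness": 1},
    {"fsm": 1, "boy": 0, "low_readiness": 1},
    {"fsm": 1, "boy": 0, "low_readiness": 0},
    {"fsm": 0, "boy": 1, "low_readiness": 0},
    {"fsm": 0, "boy": 1, "low_readiness": 0},
    {"fsm": 0, "boy": 0, "low_readiness": 0},
    {"fsm": 0, "boy": 0, "low_readiness": 0},
]

tree = grow(toy_rows, ["fsm", "boy"])
print(tree)
```

On this toy data the first split is on `fsm`, the FSM branch is split again on `boy`, and the non-FSM branch stops immediately because it is already homogeneous.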

Overall sample characteristics
The study population consisted of 142,955 children (training dataset: 128,222, testing dataset: 14,733) who completed Foundation Phase between 2012 and 2018 in Wales (see Table 1). Overall, 14.32% of children (training dataset: 14.15%, testing dataset: 15.75%) did not achieve in Foundation Phase. Of the study population, 51.24% were boys, 42.87% were not breastfed and 24.83% were born via caesarean section. In total, 8.33% were born to mothers aged below 19, 0.14% of mothers had learning difficulties and 0.23% of children lost their mother before their Foundation Phase assessment. Among mothers, 0.1% had an alcohol-related hospital admission, 0.36% had a substance abuse record and 14.63% had a smoking record in WLGP during pregnancy. Among children, 0.64% had an epilepsy-related GP visit and 0.46% had a hospital admission record for epilepsy; 3.30% and 4.54% were admitted to hospital for asthma and ear infection respectively before they completed Foundation Phase. At least one emergency hospital admission was recorded for 56.37% of children, and 66.4% had A&E records at any time between birth and Foundation Phase. A learning difficulty was diagnosed in 0.90% of children (training dataset: 0.88%, testing dataset: 1.07%). In terms of household and deprivation, 22.05% of children were in single adult households, 19.57% were eligible for FSM and 25.66% lived in the most deprived areas as measured by WIMD. Overall characteristics of the study population are described in Table 1.

Results from the decision tree
The training model consisted of 127,090 individuals who lived in Wales (excluding the testing dataset). The most important variables in the model were: FSM, gender (boy), number of school absences, child's age when completing Foundation Phase, children with any emergency hospital admission, children with any A&E attendance, children with asthma, low birth weight, maternal substance misuse related GP record, maternal substance misuse related hospital admission, not being breastfed, children with ear problems and number of children in the household (higher number). The final decision tree model is shown in Fig 2. Some case studies of the branches described in the decision tree model are given below.
1. IF children are eligible for FSM (higher family level deprivation) -> Gender-Boys -> Total number of absent sessions more than 102 THEN the probability of Failed is 73% (terminal node 31).
2. IF children are not eligible for FSM (lower family level deprivation) -> Gender-Boys -> Younger in academic year -> Total number of absent sessions more than 82 THEN they are more likely to be Failed (terminal node 95).
3. IF children are eligible for FSM (higher family level deprivation) -> Gender-Girls -> Total number of absent sessions more than 84 THEN they are more likely to be Failed (terminal node 55).
4. IF children are eligible for FSM (higher family level deprivation) -> Gender-Boys -> Younger in academic year -> Low birth weight baby -> Not breastfed THEN they are more likely to be Failed (terminal node 119).
5. IF children are not eligible for FSM (lower family level deprivation) -> Gender-Girls THEN they are more likely to be Achieved (terminal node 4).
6. IF children are not eligible for FSM (lower family level deprivation) -> Gender-Girls -> Total number of absent sessions more than 41 THEN they are more likely to be Achieved (terminal node 10).
There were 14,575 children in the testing dataset. Model performance is summarised using a confusion matrix. The model achieved 85.21% accuracy, 4.94% sensitivity, 99.37% specificity, 58.06% positive predictive value and 85.56% negative predictive value, at a prevalence of 15% (see Tables 3 and 4).
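For readers who want to see how these performance measures relate to a confusion matrix, a minimal sketch follows. The four cell counts are illustrative: they were chosen here so that they sum to 14,575 and reproduce the reported percentages, and they are not copied from Tables 3 and 4. The positive class is taken to be low school readiness (not achieved), and the function name is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard confusion-matrix summaries, returned as proportions."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # share of true positives detected
        "specificity": tn / (tn + fp),   # share of true negatives detected
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "prevalence": (tp + fn) / total,
    }


# Illustrative counts only (assumed, consistent with the reported figures).
metrics = classification_metrics(tp=108, fp=78, fn=2078, tn=12311)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With counts like these, accuracy is close to the share of children who achieved (about 85%), which is why sensitivity and the predictive values need to be read alongside it.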

Discussion
This study investigated the risk factors associated with low school readiness and developed two holistic models on a national level routine data framework. The multivariable regression model helped to identify the risk factors with the strongest association (highest odds ratio), which might not be common or frequently observed at population level; the decision tree, on the other hand, helped to identify the most important and common risk factors. Infrequent but strongly associated factors which affect a child's school readiness include the mother having a learning disability (0.14%), the child having epilepsy (0.64%) or the child being born with extremely low birth weight (1%). However, there were also factors which were both strongly associated and common, such as being a boy (51%), for whom the odds of not being school ready are more than twice those of girls (aOR: 2.11), and family-level deprivation (eligibility for FSM), which applies to 19.5% of children and nearly doubles the odds that they will not be school ready (aOR: 1.85). Low school attendance in the early years (e.g., nursery) is associated with being 2% less likely to be school ready for every day missed in nursery. The findings from our study suggest that rising poverty and the cost-of-living crisis are likely to result in lower school readiness and lower educational attainment. This will put a strain on school resources as children enter school [31]. Children in family and area-level deprivation are at higher risk of not being school ready. This finding is consistent with the existing literature [7,15]. Boys being disadvantaged compared to girls has been noted in other research [32]. In fact, it is suggested that family instability (separation, divorce, second families) affects boys more than girls, with a lack of a male influence impacting on behavioural difficulties [32], and that recent population increases in family instability can help explain a trend in lower attainment for boys at all levels. In addition, existing research clearly demonstrates that deprivation is a strong predictor of low school readiness [9,33]. Various indicators of deprivation such as parental employment, lower parental educational attainment, lower income, less time with the child, and poorer play/local area facilities have been identified as significantly linked with low school readiness [7]. Our findings, such as the significant association between family-level (eligibility for FSM) and area-level (most deprived WIMD) deprivation and a higher chance of not being school ready, are in line with those reported in the literature [9,34,35]. Hence, it is suggested that pre-school investment [35] and free childcare can overcome some of the risk factors associated with deprivation.
In this study, the decision tree model highlighted the risk factors which are clustered together, e.g., boys living in household-level deprivation with higher absences from school are at high risk of low school readiness. Similarly, girls who often miss school are at risk of not being school ready. It also showed that children who are not breastfed, have ear infections and are younger in the academic year than their peers are more likely not to be school ready. The branches of clustered risk factors are used to examine the determinants of low school readiness. These most significant risk factors can contribute to understanding the profile of vulnerable children and their families and help to improve decision making at policy level, which will support children to overcome the odds and have a better start in life. This work has been developed as part of the Early Years Vulnerability Profiling Pilot. This will enable the Health Board and local authority to plan how the Early Years Vulnerability Profile can be used to inform better targeting of prevention and early intervention for children and their families up to the age of seven years, to enable better outcomes for health, well-being, education, and social skills. The clustered risk factors can be used to understand the determinants of low school readiness at population level. This can contribute to informed decision making at policy level that supports children to have the best start in life.

Strengths and limitations
This study is based on linked data for an entire country over a 6-year period. This provides a wider range of risk factors from routine administrative data at a national level which can be addressed to improve outcomes for children who are exposed to inequalities and disadvantages from early life. It can contribute to breaking a cycle of disadvantage for children by helping to identify where and how to target early years interventions designed to improve school readiness. There is evidence that routinely collected data observed during the perinatal period can contribute to improving a child's development in the early years [15,21]. A linked population-level database can facilitate a holistic investigation of the complex factors associated with low school readiness. Longitudinal data linkage allows the developmental trajectory of the individual child to be captured from school Foundation Phase and health visitor records. Combining this with maternal physical and mental health records can only strengthen the power of the analysis, as it has been shown that maternal health and wellbeing is one of the biggest predictors of a child's development and wellbeing (Improving school readiness: Creating a better start for London). If all this information can be made available to policy makers at an early stage from school and health visitor reports, it can directly contribute to identifying the most vulnerable children and their families very early and can help to build the necessary intervention and support plans for them when they are most needed.
However, this study can only examine factors which are recorded in routine data. Important factors such as parenting style and time spent with the child reading, playing, and interacting cannot be captured with these data but would be important factors associated with school readiness. Another limitation of the study is that it only included children who were in Wales during the entire study period; children who moved out of Wales were removed because we were unable to capture their exposure records. However, this would not lead to any selection issues (please see Appendix 5 in S1 File), since these were arbitrary and independent events that neither lead to nor are linked with low school readiness. The study has identified a cluster of socioeconomic, health and household-level risk factors associated with low school readiness; establishing a direct causal pathway for the modifiable risk factors is beyond the scope of the study.
A major strength of the study is that it incorporated data from birth until children entered formal school to build the model identifying the risk factors of low school readiness. These findings can therefore help to identify children at risk of low school readiness before they start school, as many of these factors are present in the first years of life (gender, deprivation, gestational age, parental health), and those at risk can be supported through access to childcare, parenting support and breastfeeding support. In addition, the school readiness of local children coming to a school can be predicted, which means schools can have the necessary resources in place to help the specific catchment of children coming to their school.

Conclusion
This study highlighted a vulnerability profile of the children who are at higher risk of low school readiness by identifying the group of risk factors which are clustered together. The findings suggest that earlier intervention (access to childcare, mother/baby groups, community activities, parenting interventions) could help to improve the outcomes for children who are at a high risk of low school readiness. This is especially true in deprived areas with low access to childcare and where there are child or adult health problems. It has been observed that intervention programmes like Flying Start have positive effects on children living in deprivation, including improved school attendance and better educational outcomes than their peers who are in similar conditions but not in the Flying Start programme [36]. This work suggests that interventions which focus on boys in deprived areas, encourage or facilitate attendance in nursery in the early years, invest in early years childcare and promote breastfeeding would have a significant impact on school readiness. Interventions such as parenting programmes which support families with parental learning difficulties, and support when there is parental or child illness (e.g., community tutoring volunteer programmes), especially for epilepsy, would make a significant difference to the child's readiness for school. This could positively influence a child's life trajectory by strengthening foundations for lifelong learning, improving health and wellbeing outcomes throughout the life course, and reducing the education and developmental inequalities that persist.